Results 1 - 7 of 7
1.
Sci Rep ; 14(1): 5841, 2024 03 10.
Article in English | MEDLINE | ID: mdl-38462648

ABSTRACT

Cancer presents a significant global health burden, causing millions of deaths annually. Timely detection is critical for improving survival rates, offering a crucial window for medical intervention. Liquid biopsy, which analyzes genetic variations and mutations in circulating cell-free DNA and circulating tumor DNA (cfDNA/ctDNA) or molecular biomarkers, has emerged as a tool for early detection. This study focuses on cancer detection using mutations in plasma cfDNA/ctDNA and protein biomarker concentrations. The proposed system first computes the correlation coefficient to identify correlated features, while mutual information assesses each feature's relevance to the target variable, eliminating redundant features to improve efficiency. The eXtreme Gradient Boosting (XGBoost) feature importance method iteratively selects the top ten features, reducing the dataset's dimensionality by 60%. A Light Gradient Boosting Machine (LGBM) model is employed for classification, with its hyperparameters optimized through random search. Final predictions are obtained by ensembling the LGBM models from tenfold cross-validation, averaging their outputs weighted by each model's balanced accuracy. With this methodology, the proposed system achieves 99.45% accuracy and 99.95% AUC for detecting the presence of cancer, and 93.94% accuracy and 97.81% AUC for cancer-type classification. Our methodology leads to enhanced healthcare outcomes for cancer patients.
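The final ensembling step described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the probabilities and balanced accuracies are invented, and the real system produces them from tenfold cross-validation of LGBM models.

```python
def weighted_ensemble(fold_probs, balanced_accuracies):
    """Average per-fold predicted probabilities, weighting each fold's
    model by its balanced accuracy on its validation split."""
    total = sum(balanced_accuracies)
    weights = [b / total for b in balanced_accuracies]
    n_samples = len(fold_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, fold_probs))
        for i in range(n_samples)
    ]

# Three folds predicting cancer probability for two samples (made-up values):
probs = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.30]]
bal_acc = [0.99, 0.97, 0.98]
ensembled = weighted_ensemble(probs, bal_acc)
labels = [int(p >= 0.5) for p in ensembled]  # 1 = cancer present
```

Models with higher balanced accuracy pull the average harder, so a weak fold cannot dominate the final call.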


Subjects
Cell-Free Nucleic Acids , Neoplasms , Humans , Liquid Biopsy/methods , Cell-Free Nucleic Acids/genetics , Neoplasms/diagnosis , Neoplasms/genetics , DNA, Neoplasm , Machine Learning
2.
BMC Bioinformatics ; 25(1): 61, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38321434

ABSTRACT

BACKGROUND: The rapid advancement of next-generation sequencing (NGS) machines in speed and affordability has led to massive amounts of biological data at the expense of data quality, as errors become more prevalent. This creates the need for approaches to detect and filter errors, moving data quality assurance from the hardware space to the software preprocessing stages. RESULTS: We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier for filtering Erroneous NGS Reads. MAC-ErrorReads casts erroneous-read filtration as a robust binary classification task, employing five supervised machine learning algorithms. These models are trained on features extracted by computing Term Frequency-Inverse Document Frequency (TF-IDF) values from various datasets: E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1, and Metriaclima zebra. Notably, Naive Bayes demonstrated robust performance across datasets, displaying high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, tools like Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed rates above 99%; BFC and RECKONER exceeded 98%, while Fiona reached 95.78%. For Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads demonstrated good alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For Metriaclima zebra, Pollux achieved a high alignment rate of 91.23%, despite having the lowest number of mapped reads; MAC-ErrorReads, Karect, and RECKONER achieved good alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also producing reasonable numbers of reads mapped to the reference genome.
CONCLUSIONS: This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.
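The TF-IDF featurization step described above can be sketched as follows. This is an illustrative toy, not the MAC-ErrorReads code: the k-mer length and reads are invented, and each read is treated as a "document" whose overlapping k-mers are its "terms".

```python
import math
from collections import Counter

def kmers(read, k=3):
    """All overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def tf_idf(reads, k=3):
    """Per-read TF-IDF scores over k-mer vocabulary."""
    docs = [Counter(kmers(r, k)) for r in reads]
    n = len(docs)
    df = Counter()                      # in how many reads each k-mer occurs
    for d in docs:
        df.update(d.keys())
    features = []
    for d in docs:
        total = sum(d.values())
        features.append({
            kmer: (count / total) * math.log(n / df[kmer])
            for kmer, count in d.items()
        })
    return features

reads = ["ACGTACGT", "ACGTTTTT", "GGGGCCCC"]
feats = tf_idf(reads)
```

K-mers shared by every read score zero (log of 1), while k-mers unique to a read, e.g. ones introduced by a sequencing error, score highest, which is what makes this representation informative for an erroneous-read classifier.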


Subjects
Arabidopsis , Artificial Intelligence , Bayes Theorem , Escherichia coli , Staphylococcus aureus , Software , Algorithms , High-Throughput Nucleotide Sequencing , Machine Learning , Sequence Analysis, DNA
3.
Sci Rep ; 13(1): 19892, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37963976

ABSTRACT

Concrete is a cost-effective construction material widely used in building infrastructure projects. High-performance concrete, characterized by strength and durability, is crucial for structures that must withstand heavy loads and extreme weather conditions. Accurate prediction of concrete strength under different mixtures and loading conditions is essential for optimizing performance, reducing costs, and enhancing safety. Recent advancements in machine learning offer solutions to challenges in structural engineering, including concrete strength prediction. This paper evaluates the performance of eight popular machine learning models: regression methods such as Linear, Ridge, and LASSO; tree-based models such as Decision Trees, Random Forests, and XGBoost; and SVM and ANN. The assessment was conducted on a standard dataset comprising 1030 concrete samples. Our experimental results demonstrate that ensemble learning techniques, notably XGBoost, outperformed the other algorithms with an R-squared (R2) of 0.91 and a Root Mean Squared Error (RMSE) of 4.37. Additionally, we employed the SHAP (SHapley Additive exPlanations) technique to analyze the XGBoost model, giving civil engineers insights for informed decisions about concrete mix design and construction practices.
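The two evaluation metrics reported above (R2 and RMSE) can be computed as in this minimal sketch. The strength values here are invented for illustration and are not the paper's 1030-sample dataset.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (MPa here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Fraction of target variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Measured compressive strengths (MPa) vs. a model's predictions (made up):
y_true = [30.0, 45.0, 52.0, 38.0]
y_pred = [31.5, 44.0, 50.0, 39.0]
```

RMSE keeps the target's physical units, which is why the paper's 4.37 can be read directly as an average error of about 4.37 MPa.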

4.
Sci Rep ; 13(1): 7961, 2023 05 17.
Article in English | MEDLINE | ID: mdl-37198193

ABSTRACT

Eye-based communication languages such as Blink-To-Speak play a key role in expressing the needs and emotions of patients with motor neuron disorders. Most existing eye-tracking systems are complex and unaffordable in low-income countries. Blink-To-Live is an eye-tracking system based on a modified Blink-To-Speak language and computer vision for patients with speech impairments. A mobile phone camera tracks the patient's eyes, sending real-time video frames to computer vision modules for facial landmark detection, eye identification, and tracking. The Blink-To-Live eye-based communication language defines four key alphabets: Left, Right, Up, and Blink. These eye gestures encode more than 60 daily-life commands, each expressed as a sequence of three eye-movement states. Once the gesture-encoded sentences are generated, the translation module displays the phrases in the patient's native language on the phone screen, and a synthesized voice can be heard. A prototype of the Blink-To-Live system was evaluated with unimpaired participants of different demographic characteristics. Unlike sensor-based eye-tracking systems, Blink-To-Live is simple, flexible, and cost-efficient, with no dependency on specific software or hardware requirements. The software and its source are available from the GitHub repository ( https://github.com/ZW01f/Blink-To-Live ).
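The translation step described above, mapping triples of eye-movement states to phrases, can be sketched as a simple lookup. The command table below is invented for illustration; the real Blink-To-Live vocabulary lives in the linked repository.

```python
# Each command is encoded by a sequence of three eye states
# drawn from the four alphabets: Left, Right, Up, Blink.
COMMANDS = {
    ("Left", "Up", "Blink"): "I am hungry",      # hypothetical entry
    ("Right", "Right", "Blink"): "I need help",  # hypothetical entry
    ("Up", "Up", "Up"): "Yes",                   # hypothetical entry
}

def decode(eye_states):
    """Group a stream of detected eye states into triples and translate
    each known triple into a phrase; unknown triples are skipped."""
    phrases = []
    for i in range(0, len(eye_states) - 2, 3):
        triple = tuple(eye_states[i:i + 3])
        if triple in COMMANDS:
            phrases.append(COMMANDS[triple])
    return phrases

# A detected gesture stream from the vision modules:
stream = ["Left", "Up", "Blink", "Up", "Up", "Up"]
phrases = decode(stream)
```

With four states per position, three-state sequences give 64 possible codes, which matches the abstract's "more than 60 daily life commands".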


Subjects
Blinking , Speech , Humans , Eye , Eye Movements , Software , Speech Disorders
5.
PLoS One ; 13(5): e0196707, 2018.
Article in English | MEDLINE | ID: mdl-29723232

ABSTRACT

Kidney exchange programs bring new insights to the field of organ transplantation. They allow transplants between incompatible patient-donor pairs, previously not permitted, to be performed on a large scale. Mathematically, kidney exchange is an optimization problem over the number of possible exchanges among the incompatible pairs in a given pool. The optimization model should also consider the expected quality-adjusted life of transplant candidates and the shortage of computational and operational hospital resources. In this article, we introduce a bio-inspired, stochastic Ant Lion Optimization (ALO) algorithm to the kidney exchange space to maximize the number of feasible cycles and chains among the pool pairs. The ALO-based program achieves kidney exchange results comparable to deterministic approaches such as integer programming. ALO also outperforms other stochastic methods such as the Genetic Algorithm in terms of efficient use of computational resources and the quantity of resulting exchanges. The ALO algorithm can be adopted easily for online exchanges and the integration of weights for hard-to-match patients, which will improve future decisions of kidney exchange programs. A reference implementation of the ALO algorithm for kidney exchanges is written in MATLAB and is GPL licensed. It is available as free open-source software from: https://github.com/SaraEl-Metwally/ALO_algorithm_for_Kidney_Exchanges.
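The combinatorial structure underlying the problem above can be sketched as follows. This is not the ALO algorithm itself, just an illustration of the search space it optimizes over: a directed compatibility graph where an edge u -> v means pair u's donor is compatible with pair v's patient, and feasible exchanges are short directed cycles.

```python
def find_cycles(edges, max_len=3):
    """Enumerate simple directed cycles of length <= max_len,
    reporting each cycle once, starting from its smallest node."""
    cycles = []

    def extend(path):
        last = path[-1]
        for nxt in edges.get(last, ()):
            if nxt == path[0] and len(path) >= 2:
                cycles.append(tuple(path))
            elif nxt not in path and nxt > path[0] and len(path) < max_len:
                extend(path + [nxt])

    for start in edges:
        extend([start])
    return cycles

# Pairs 0..3 (made-up pool): a 2-way swap 0 <-> 1 and a 3-way chain 1 -> 2 -> 3 -> 1.
graph = {0: [1], 1: [0, 2], 2: [3], 3: [1]}
cycles = find_cycles(graph)
```

Real programs cap cycle length (typically 2 or 3) because all surgeries in a cycle must happen simultaneously; an optimizer such as ALO or integer programming then selects a disjoint set of these cycles covering the most pairs.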


Subjects
Algorithms , Histocompatibility , Kidney Transplantation , Living Donors/supply & distribution , Software , Tissue and Organ Procurement/methods , ABO Blood-Group System/immunology , HLA Antigens/immunology , Histocompatibility Testing , Humans , Kidney Failure, Chronic/surgery , Stochastic Processes , Tissue and Organ Procurement/organization & administration
6.
Bioinformatics ; 32(21): 3215-3223, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27412092

ABSTRACT

MOTIVATION: The deluge of sequencing data has outpaced Moore's law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we can generate ever more data at high speed and fixed cost, but lack the computational resources to store, process, and analyze it. With error-prone high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine the power of advanced computing techniques with innovative data structures to encode the assembly graph efficiently in memory. RESULTS: LightAssembler is a lightweight assembly algorithm designed to run on a desktop machine. It uses a pair of cache-oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to competing tools. Our method reduces memory usage by [Formula: see text] compared to resource-efficient assemblers on benchmark datasets from the GAGE and Assemblathon projects. While LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. AVAILABILITY AND IMPLEMENTATION: https://github.com/SaraEl-Metwally/LightAssembler CONTACT: sarah_almetwally4@mans.edu.eg SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
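A minimal Bloom filter, the membership structure LightAssembler builds on, can be sketched as below. The sizing and hashing here are simplified illustrations, not the paper's cache-oblivious design.

```python
import hashlib

class BloomFilter:
    """Space-efficient set membership with one-sided error:
    'in' may rarely return a false positive, never a false negative."""

    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for kmer in ("ACGTA", "CGTAC", "GTACG"):
    bf.add(kmer)
```

Because the filter stores only bits rather than the k-mers themselves, memory use is fixed regardless of k, which is what makes Bloom filters attractive for holding billions of sampled k-mers on a desktop machine.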


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Animals , Genome , Genomics , Humans , Sequence Analysis, DNA
7.
PLoS Comput Biol ; 9(12): e1003345, 2013.
Article in English | MEDLINE | ID: mdl-24348224

ABSTRACT

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads they produce remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. We discuss these stages as a framework for data analysis and processing, and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges facing current assemblers in the next-generation environment and assess the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
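The graph-construction stage discussed above can be illustrated with a toy de Bruijn graph, the structure most short-read assemblers build: nodes are (k-1)-mers and each read k-mer contributes an edge from its prefix to its suffix. The reads and k below are invented; real assemblers layer error filtering and simplification on top.

```python
from collections import defaultdict

def de_bruijn(reads, k=4):
    """Build a de Bruijn graph: map each (k-1)-mer prefix to the
    (k-1)-mer suffixes that follow it in some read. Repeated k-mers
    produce repeated edges, preserving coverage information."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

# Two overlapping toy reads from the same genomic region:
reads = ["ACGTAC", "CGTACG"]
g = de_bruijn(reads)
```

Assembly then amounts to walking paths (ideally an Eulerian path) through this graph, and the simplification stage collapses unambiguous chains and prunes error-induced tips and bubbles.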


Subjects
DNA/chemistry , Sequence Analysis, DNA/methods , Algorithms , Base Sequence , Genome , Sequence Alignment , Software